Yupeng Cao

Herculean: An Agentic Benchmark for Financial Intelligence

May 14, 2026
Via arXiv

FinTrace: Holistic Trajectory-Level Evaluation of LLM Tool Calling for Long-Horizon Financial Tasks

Apr 11, 2026
Via arXiv

Can LLM Agents Be CFOs? A Benchmark for Resource Allocation in Dynamic Enterprise Environments

Mar 24, 2026
Via arXiv

MERMAID: Memory-Enhanced Retrieval and Reasoning with Multi-Agent Iterative Knowledge Grounding for Veracity Assessment

Jan 29, 2026
Via arXiv

All That Glisters Is Not Gold: A Benchmark for Reference-Free Counterfactual Financial Misinformation Detection

Jan 08, 2026
Via arXiv

FinCriticalED: A Visual Benchmark for Financial Fact-Level OCR Evaluation

Nov 19, 2025
Via arXiv

Enhancing Scene Transition Awareness in Video Generation via Post-Training

Jul 24, 2025
Via arXiv

Can AI Validate Science? Benchmarking LLMs for Accurate Scientific Claim $\rightarrow$ Evidence Reasoning

Jun 09, 2025
Via arXiv

Truth Neurons

May 18, 2025
Via arXiv

FinAudio: A Benchmark for Audio Large Language Models in Financial Applications

Mar 26, 2025
Via arXiv